SMA Quick Review

Simple Moving Average

We have already shown how to create a simple moving average. As a quick review:

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

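# Load the data, using the "Month" column as the index
airline = pd.read_csv('airline_passengers.csv', index_col="Month")

# 6-month and 12-month simple moving averages of the passenger counts
airline['6-month-SMA'] = airline['Thousands of Passengers'].rolling(window=6).mean()
airline['12-month-SMA'] = airline['Thousands of Passengers'].rolling(window=12).mean()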
In [2]:
airline.head()
Out[2]:
         Thousands of Passengers  6-month-SMA  12-month-SMA
Month
1949-01                    112.0          NaN           NaN
1949-02                    118.0          NaN           NaN
1949-03                    132.0          NaN           NaN
1949-04                    129.0          NaN           NaN
1949-05                    121.0          NaN           NaN
In [3]:
airline.plot(figsize=(10,8))
Out[3]:
<matplotlib.axes._subplots.AxesSubplot at 0x6f5ddd0>

EWMA

Exponentially-weighted moving average

We just showed how to calculate the SMA over a fixed window. However, the basic SMA has some weaknesses:

  • Smaller windows lead to more noise rather than signal.
  • It always lags the data, and the lag grows with the size of the window.
  • It never reaches the full peak or valley of the data, because of the averaging.
  • It does not really inform you about possible future behaviour; all it really does is describe trends in your data.
  • Extreme historical values can skew your SMA significantly.
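
To make two of these weaknesses concrete, here is a minimal sketch on a made-up toy series (the values and the window of 3 are assumptions chosen for illustration, not taken from the airline data). It shows the missing values at the start of the window and how averaging flattens a peak:

```python
import pandas as pd

# Hypothetical toy series: flat, one sharp peak, then flat again.
toy = pd.Series([10, 10, 10, 10, 40, 10, 10, 10, 10, 10], dtype=float)

# 3-period simple moving average
sma3 = toy.rolling(window=3).mean()

# No SMA value exists until the window is full, so the first
# window - 1 entries are NaN.
n_missing = int(sma3.isna().sum())

# Averaging spreads the peak over the window, so the SMA maximum
# stays well below the raw maximum.
raw_peak = toy.max()
sma_peak = sma3.max()

print(pd.DataFrame({"raw": toy, "SMA3": sma3}))
print(f"NaN values at start: {n_missing}")
print(f"raw peak = {raw_peak}, SMA peak = {sma_peak}")
```

The single spike of 40 also keeps influencing the SMA for three periods after it occurs, which is the "extreme values skew the SMA" effect in miniature.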

To help fix some of these issues, we can use an EWMA (Exponentially-weighted moving average).

EWMA allows us to reduce the lag effect of the SMA by putting more weight on values that occurred more recently (hence the name). The amount of weight applied to the most recent values depends on the parameters used in the EWMA and on the number of periods in the window. Full details on the mathematics behind this can be found here. Here is a shorter version of the explanation behind EWMA.

The formula for EWMA is:

$ y_t = \frac{\sum\limits_{i=0}^t w_i x_{t-i}}{\sum\limits_{i=0}^t w_i} $

where x_t is the input value, w_i is the applied weight (note how it can change from i=0 to t), and y_t is the output.

Now the question is, how do we define the weight term w_i?

This depends on the adjust parameter you provide to the .ewm() method.

When adjust is True (default), the weights are w_i = (1 - α)^i, so the weighted average is calculated as:

$y_t = \frac{x_t + (1 - \alpha)x_{t-1} + (1 - \alpha)^2 x_{t-2} + ... + (1 - \alpha)^t x_{0}}{1 + (1 - \alpha) + (1 - \alpha)^2 + ... + (1 - \alpha)^t}$
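
As a sanity check, the adjust=True formula can be evaluated directly and compared with what pandas returns. This is only a sketch: the helper name `ewma_adjusted` is hypothetical, the input is the first five passenger values from the table above, and alpha = 0.5 is an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd

# First five passenger values from the table above; alpha is an arbitrary choice.
x = pd.Series([112.0, 118.0, 132.0, 129.0, 121.0])
alpha = 0.5


def ewma_adjusted(values, alpha):
    """Evaluate the adjust=True weighted average directly for every t."""
    values = np.asarray(values, dtype=float)
    out = np.empty(len(values))
    for t in range(len(values)):
        # w_i = (1 - alpha)^i for i = 0..t; weight w_i multiplies x_{t-i}
        w = (1 - alpha) ** np.arange(t + 1)
        # values[t::-1] is x_t, x_{t-1}, ..., x_0
        out[t] = np.sum(w * values[t::-1]) / np.sum(w)
    return out


manual = ewma_adjusted(x, alpha)
pandas_adjusted = x.ewm(alpha=alpha, adjust=True).mean().to_numpy()

print(manual)
print(np.allclose(manual, pandas_adjusted))
```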

When adjust=False is specified, moving averages are calculated as:

$\begin{split}y_0 &= x_0 \\ y_t &= (1 - \alpha) y_{t-1} + \alpha x_t,\end{split}$

which is equivalent to using weights:

$\begin{split}w_i = \begin{cases} \alpha (1 - \alpha)^i & \text{if } i < t \\ (1 - \alpha)^i & \text{if } i = t. \end{cases}\end{split}$
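
The equivalence between the recursion and the explicit weights can be checked numerically. The sketch below uses two hypothetical helpers (`ewma_recursive` and `ewma_explicit_weights`), the first five passenger values from the table above, and an arbitrary alpha = 0.5, then compares both against pandas with adjust=False:

```python
import numpy as np
import pandas as pd

# First five passenger values from the table above; alpha is an arbitrary choice.
x = pd.Series([112.0, 118.0, 132.0, 129.0, 121.0])
alpha = 0.5


def ewma_recursive(values, alpha):
    """y_0 = x_0, then y_t = (1 - alpha) * y_{t-1} + alpha * x_t."""
    values = np.asarray(values, dtype=float)
    out = np.empty(len(values))
    out[0] = values[0]
    for t in range(1, len(values)):
        out[t] = (1 - alpha) * out[t - 1] + alpha * values[t]
    return out


def ewma_explicit_weights(values, alpha):
    """Same quantity via the explicit weights w_i applied to x_{t-i}."""
    values = np.asarray(values, dtype=float)
    out = np.empty(len(values))
    for t in range(len(values)):
        w = alpha * (1 - alpha) ** np.arange(t + 1)  # case i < t
        w[t] = (1 - alpha) ** t                      # case i = t (weight on x_0)
        out[t] = np.sum(w * values[t::-1]) / np.sum(w)
    return out


recursive = ewma_recursive(x, alpha)
weighted = ewma_explicit_weights(x, alpha)
pandas_unadjusted = x.ewm(alpha=alpha, adjust=False).mean().to_numpy()

print(recursive)
print(np.allclose(recursive, weighted), np.allclose(recursive, pandas_unadjusted))
```

Note that these weights already sum to 1 for every t, so the division by their sum in the general formula changes nothing here.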

When adjust=False we have $y_0 = x_0$ and, from the recursive form above, $y_t = \alpha x_t + (1 - \alpha) y_{t-1}$. There is therefore an assumption that $x_0$ is not an ordinary value but rather an exponentially weighted moment of the infinite series up to that point.

One must have $0 < \alpha \leq 1$. Since pandas version 0.18.0 it has been possible to pass α directly, but it is often easier to think in terms of the span, center of mass (com), or half-life of an EW moment:

$\begin{split}\alpha = \begin{cases} \frac{2}{s + 1}, & \text{for span}\ s \geq 1\\ \frac{1}{1 + c}, & \text{for center of mass}\ c \geq 0\\ 1 - \exp\left(\frac{\log 0.5}{h}\right), & \text{for half-life}\ h > 0 \end{cases}\end{split}$

  • Span corresponds to what is commonly called an N-day EW moving average.
  • Center of mass has a more physical interpretation and can be thought of in terms of span: c=(s−1)/2
  • Half-life is the period of time for the exponential weight to reduce to one half.
  • Alpha specifies the smoothing factor directly.
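
The conversions above can be written out and checked against pandas. In this sketch the three `alpha_from_*` helpers are hypothetical names introduced for illustration; the input values are the first five passenger counts from the table above, and the half-life of 3 is an arbitrary choice.

```python
import numpy as np
import pandas as pd


def alpha_from_span(s):
    return 2.0 / (s + 1.0)


def alpha_from_com(c):
    return 1.0 / (1.0 + c)


def alpha_from_halflife(h):
    return 1.0 - np.exp(np.log(0.5) / h)


x = pd.Series([112.0, 118.0, 132.0, 129.0, 121.0])

span = 12
com = (span - 1) / 2  # c = (s - 1) / 2, the relation stated above

# A span and its matching center of mass give the same alpha.
a_span = alpha_from_span(span)
a_com = alpha_from_com(com)

# Each parametrisation should give the same result as passing alpha directly.
by_span = x.ewm(span=span).mean()
by_com = x.ewm(com=com).mean()
by_alpha = x.ewm(alpha=a_span).mean()

by_halflife = x.ewm(halflife=3).mean()
by_halflife_alpha = x.ewm(alpha=alpha_from_halflife(3)).mean()

print(a_span, a_com)
print(np.allclose(by_span, by_alpha), np.allclose(by_com, by_alpha))
print(np.allclose(by_halflife, by_halflife_alpha))
```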

Imports

In [4]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

Get Data

In [5]:
airline = pd.read_csv('airline_passengers.csv', index_col="Month")

The index is not a DatetimeIndex yet. It is a plain string index.

In [6]:
airline.index
Out[6]:
Index(['1949-01', '1949-02', '1949-03', '1949-04', '1949-05', '1949-06',
       '1949-07', '1949-08', '1949-09', '1949-10',
       ...
       '1960-04', '1960-05', '1960-06', '1960-07', '1960-08', '1960-09',
       '1960-10', '1960-11', '1960-12',
       'International airline passengers: monthly totals in thousands. Jan 49 ? Dec 60'],
      dtype='object', name='Month', length=145)

Create a DateTimeIndex

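The last row of the file is a text footer rather than a data point (see the final index entry above), so its passenger value is missing. Use dropna() to remove it before converting the index!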

In [7]:
airline.dropna(inplace=True)
airline.index = pd.to_datetime(airline.index)
In [8]:
airline.head()
Out[8]:
Thousands of Passengers
Month
1949-01-01 112.0
1949-02-01 118.0
1949-03-01 132.0
1949-04-01 129.0
1949-05-01 121.0

It's now a DateTime index

In [9]:
airline.index
Out[9]:
DatetimeIndex(['1949-01-01', '1949-02-01', '1949-03-01', '1949-04-01',
               '1949-05-01', '1949-06-01', '1949-07-01', '1949-08-01',
               '1949-09-01', '1949-10-01',
               ...
               '1960-03-01', '1960-04-01', '1960-05-01', '1960-06-01',
               '1960-07-01', '1960-08-01', '1960-09-01', '1960-10-01',
               '1960-11-01', '1960-12-01'],
              dtype='datetime64[ns]', name='Month', length=144, freq=None)
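
The same two cleaning steps (drop the footer row, then convert the index) can be tried without the CSV file. The sketch below builds a small stand-in frame by hand; the footer label is a placeholder and the three data rows copy the first values shown above.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for the raw file: three data rows plus a footer row
# whose passenger value is missing. The footer label is a placeholder.
demo = pd.DataFrame(
    {"Thousands of Passengers": [112.0, 118.0, 132.0, np.nan]},
    index=pd.Index(
        ["1949-01", "1949-02", "1949-03", "footer text, not a date"],
        name="Month",
    ),
)

# Step 1: drop the row with the missing value (the footer).
cleaned = demo.dropna()

# Step 2: only now convert the remaining string labels to timestamps.
cleaned.index = pd.to_datetime(cleaned.index)

print(cleaned)
print(type(cleaned.index).__name__)
```

The order matters: the footer label is not a date, so it has to be removed before the index is converted.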

Calculate EWMA

Use the span approach to calculate the EWMA. For monthly data, pass span=12 to the ewm() method and then call mean() on the result.

In [10]:
airline['EWMA12'] = airline['Thousands of Passengers'].ewm(span=12).mean()
In [11]:
airline[['Thousands of Passengers','EWMA12']].plot()
Out[11]:
<matplotlib.axes._subplots.AxesSubplot at 0x8e504f0>

The behavior at the beginning of the series differs from the behavior at the end: the seasonal pattern is clearer towards the end than at the beginning. This is because each EWMA value weights the observations closest to it in time more heavily than older ones, so the curve keeps following recent swings in the data instead of averaging them out evenly.
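
To see the effect of weighting recent points more heavily, we can compare how a 12-period SMA and an EWMA with span=12 respond to a sudden jump. This is a sketch on a made-up step series (an assumption for illustration, not the airline data):

```python
import pandas as pd

# Hypothetical step series: 12 periods at 0, then 12 periods at 100.
step = pd.Series([0.0] * 12 + [100.0] * 12)

sma12 = step.rolling(window=12).mean()
ewma12 = step.ewm(span=12).mean()

# Position 14 is the third period at the new level.
t = 14
print(f"SMA  at t={t}: {sma12.iloc[t]:.2f}")
print(f"EWMA at t={t}: {ewma12.iloc[t]:.2f}")

# Position 23 is the first period where the SMA window holds only new values.
print(f"SMA  at t=23: {sma12.iloc[23]:.2f}")
print(f"EWMA at t=23: {ewma12.iloc[23]:.2f}")
```

Shortly after the jump the EWMA has moved further towards the new level than the SMA. The trade-off is that the SMA reaches the new level exactly once its window has rolled past the jump, while the EWMA only approaches it gradually because the old values never lose their weight entirely.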